Predicting Lexical Norms Using a Word Association Corpus
Authors
Abstract
Obtaining norm scores for subjective properties of words can be quite cumbersome, as it requires a considerable investment proportional to the size of the word set. We present a method to predict norm scores for large word sets from a word association corpus. We use similarities between word pairs, derived from this corpus, to construct a semantic space. Starting from norm scores for a subset of the words, we retrieve the direction in the space that optimally reflects the norm data associated with those words. All other words in the semantic space are then orthogonally projected onto this direction, which yields predictions for those words on the variable of interest. In this study, we predict valence, arousal, dominance, age of acquisition, and concreteness, and show that the predictions correlate strongly with the judgments of human raters. Furthermore, we show that our predictions are superior to those derived using other methods.
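The projection step described in the abstract can be illustrated with a minimal sketch. It assumes the semantic space is already available as a matrix of word coordinates and uses an ordinary least-squares fit to find the direction; the abstract does not state which fitting procedure the authors use, so that choice, the synthetic data, and the names `fit_direction` and `project` are all hypothetical.

```python
import numpy as np

def fit_direction(coords, scores):
    """Return a unit vector along which the seed words' coordinates
    best reflect their norm scores (least-squares fit, an assumption)."""
    centered = coords - coords.mean(axis=0)
    weights, *_ = np.linalg.lstsq(centered, scores - scores.mean(), rcond=None)
    return weights / np.linalg.norm(weights)

def project(coords, direction):
    """Orthogonally project every word onto the unit direction."""
    return coords @ direction

# Synthetic stand-in for a semantic space: 200 words in 5 dimensions.
rng = np.random.default_rng(0)
space = rng.normal(size=(200, 5))

# Synthetic "human ratings": linear in the coordinates plus a little noise.
true_direction = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
norms = space @ true_direction + rng.normal(scale=0.1, size=200)

# Only the first 50 words have known norm scores (the seed subset).
seed_idx = np.arange(50)
direction = fit_direction(space[seed_idx], norms[seed_idx])

# Predict all words, then evaluate on the words that were held out.
predictions = project(space, direction)
held_out = np.arange(50, 200)
r = np.corrcoef(predictions[held_out], norms[held_out])[0, 1]
print(f"held-out correlation: {r:.3f}")
```

The projections are only defined up to a shift and a scale, which is why the sketch evaluates them with a correlation against the held-out ratings rather than with an absolute error.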
Similar resources
Comparing Lexical Relationships Observed within Japanese Collocation Data and Japanese Word Association Norms
While large-scale corpora and various corpus query tools have long been recognized as essential language resources, the value of word association norms as language resources has been largely overlooked. This paper conducts some initial comparisons of the lexical relationships observed within Japanese collocation data extracted from a large corpus using the Japanese language version of the Sketc...
Collecting and Exploring Everyday Language for Predicting Psycholinguistic Properties of Words
Exploring language usage through frequency analysis in large corpora is a defining feature in most recent work in corpus and computational linguistics. From a psycholinguistic perspective, however, the corpora used in these contributions are often not representative of language usage: they are either domain-specific, limited in size, or extracted from unreliable sources. In an effort to address...
Coling 2008: 22nd International Conference on Computational Linguistics
Using Web Corpora for the Automatic Acquisition of Lexical-Semantic Knowledge
This article presents two case studies to explore whether and how web corpora can be used to automatically acquire lexical-semantic knowledge from distributional information. For this purpose, we compare three German web corpora and a traditional newspaper corpus on modelling two types of semantic relatedness: (1) Assuming that free word associations are semantically related to their stimuli, w...
Developing a Corpus-Based Word List in Pharmacy Research Articles: A Focus on Academic Culture
The present corpus-based lexical study reports the development of a Pharmacy Academic Word List (PAWL); a list of the most frequent words from a corpus of 3,458,445 tokens made up of 800 most recent pharmacy texts including research articles, review articles, and short communications in four sub-disciplines of pharmacy. WordSmith (Scott, 2017) and AntWordProfiler (Anthony, 2014) were used to sc...